1 Introduction

12 organoid lines were screened with about 70 compounds (5 concentrations) of the clinical cancer panel. After 7 days total (4 days of treatment) the organoids we lysed and a ctg assay was performed. The experiment was conducted in 2 replciates.

1.1 Load packages and data

Load necessary packages for data analysis

library(tidyverse)
# library(readr) library(dplyr) library(tidyr)
# library(ggplot2) library(corrr)
library(pheatmap)
# library(gridExtra) library(EBImage) #cite library(HTSvis)
# #cite
library(PharmacoGx)  #cite
library(growthcurve)  #cite
# library(jsonlite) library(httr) library(openxlsx)
library(preprocessCore)  #cite
library(stringr)
library(GRmetrics)
# library(scales) library(ggpubr)
library(readxl)
library(ggsignif)
library(drc)

Load clinical data. The primary location of tumors is divided into three sights.

#import clinical data
cohort <- read_excel("~/promise/local_data/clinical_data/Clin_Data_Basic-cohort.xlsx") %>%
  separate(ID, c("p", "no.line"), sep = 1) %>%
  mutate(line = paste0("D", no.line, "T01")) %>% #only works if there were no re-biopsies
  dplyr::select(-p, -no.line)

mutations <- read_excel("~/PowerFolders/2017-08 PDO [Johannes Betge Niklas Rintdorff]/Tables/S3_Mutations_Organoids.xlsx")
mutations_freq <- read_excel("~/promise/local_data/sequencing/171114_Mutations_Organoids.xlsx")

#mutation matrix based on thresholded AF
mutations_contig <- mutations %>% #mutation_matrix <-
  dplyr::select(sample, GENE) %>%
  mutate(sample = paste0("D", str_sub(sample, 2, length(sample)), "01")) %>%
  dplyr::rename(line = sample) %>%
  mutate(status = "mutated") %>%
  complete(line, GENE, fill = list(status = "wildtype")) %>%
  distinct(line, GENE, status, .keep_all = TRUE) %>%
  #group_by(line) %>%
  spread(GENE, status) %>%
  mutate(ras = if_else(KRAS == "mutated" | NRAS == "mutated", "mutated", "wildtype"))

#mutation matrix with AF
mutations_freq_contig <- mutations_freq %>%
  dplyr::select(sample, GENE, AF) %>% 
  mutate(sample = paste0("D", str_sub(sample, 2, length(sample)), "01")) %>%
  dplyr::rename(line = sample) %>%
  complete(line, GENE, fill = list(AF = 0)) %>%
  distinct(line, GENE, .keep_all = TRUE) %>%
  #group_by(line) %>%
  spread(GENE, AF)

#define RAS mutation status
ras <- mutations %>% 
  group_by(sample) %>%
  summarise(ras = grepl(pattern = "RAS", GENE) %>% max()) %>%
  mutate(ras = if_else(ras == 1, "mutant", "wildtype"),
         line = paste0("D", str_sub(sample, 2, length(sample)), "01")) %>%
  dplyr::select(-sample)

cohort_short <- cohort %>%
  #dplyr::select(line, Location) %>%
  dplyr::select(line, age, Location, `T`, N, M, Stage, G) %>%
  mutate(Patient = line) %>%
  dplyr::select(-line) 

#Set locations
cohort_short$Location[cohort_short$Location == "Asc"] <- "Right"
cohort_short$Location[cohort_short$Location == "Sigm"] <- "Left"

cohort_short <- cohort_short %>%
  mutate(Location = factor(Location, levels = c("Right", "Left", "Rectum")))

2 Preparation

2.1 Normalize CTG photon counts

2.1.1 Validation of different normalization strategies

Appropriate normalization of CTG photon counts is key to ensure robust quantification of drug response. Many different strategies are available to remove potential technical bias from the data and some of them might be more appropriate than others depending on the kinds of bias present in the data. Hence, we examine carefully the kinds of technical bias present in our data set and employ different normalization strategies in order to correct these. We assess and compare performance of these different strategies by looking at various controls.

2.1.2 Assessment of technical bias present in the data

## read ctg data files
ctg_data <- lapply(list.files("~/promise/rsync_data/ctg_data/HC1092/", 
    full.names = T), function(f) read_tsv(f, col_names = F, col_types = "cci") %>% 
    `colnames<-`(c("screen", "well", "pcount"))) %>% bind_rows()

For now we load again the plate annotation and merge that to the data to know which drug at which concentration went where.

## read excel file, convert to tibble
plate_anno <- tbl_df(read_excel("~/promise/rsync_data/layouts/raw_layout_files/Clin_Cancer_Panel_V170511.xlsx"))
## rename drug and well columns
colnames(plate_anno)[c(1, 6)] <- c("drug", "well")
## join with ctg data
ctg_data <- ctg_data %>% inner_join(plate_anno) %>% extract(well, 
    c("row", "col"), regex = "([A-P])(\\d{2})", remove = F) %>% 
    mutate(row = match(row, LETTERS), col = as.integer(col))

There was a technical issue with one of the plates. Wells have to be flagged accordingly.

ctg_data <- ctg_data %>% mutate(pcount = ifelse(screen == "170502_NR_M2_D004T01P006L08" & 
    (row %in% seq(1, 15, by = 2)) & (col %in% 1:13), NA, pcount))

Now we can get an idea of what the data looks like. First we look at only one plate at a time and look at the photon counts of DMSO controls to see how they vary and if there might be a positional bias that needs correction.

DMSO controls are scattered across the plate and there is no obvious spacial bias with the exception, maybe, of the control at the bottom right of the plate. However, some of the controls clearly failed and have to be flagged form the analysis. What is concerning is that these controls are all in the same area. Maybe plotting the plate as a whole reveals a spacial bias across all drugs.

# p %>% filter(screen == '170502_NR_M3_D007T01P006L08') %>%
# dplyr::select(row, col, pcount) %>% spread(col, pcount) %>%
# dplyr::select(-row) %>% as.matrix() %>%
# pheatmap(cluster_rows=F, cluster_cols=F)
ctg_data %>% group_by(row, col) %>% summarise(pcount = sum(pcount, 
    na.rm = TRUE)) %>% ungroup() %>% dplyr::select(row, col, 
    pcount) %>% spread(col, pcount) %>% dplyr::select(-row) %>% 
    as.matrix() %>% pheatmap(cluster_rows = F, cluster_cols = F)

I would say that there is definitely a spacial bias where photon counts get higher towards the bottom and the edges of the plate perhaps.

## scatter plot with robust loess fit for rows.
ctg_data %>% ggplot(aes(row, pcount)) + geom_point() + geom_smooth(method.args = list(family = "symmetric")) + 
    facet_wrap(~screen) + theme_classic() + ggtitle("Spatial effects of rows")

## scatter plot with robust loess fit for columns
ctg_data %>% ggplot(aes(col, pcount)) + geom_point() + geom_smooth(method.args = list(family = "symmetric")) + 
    facet_wrap(~screen) + theme_classic() + ggtitle("Spatial effects of columns")

2.1.3 Normlization of photon count data by plate DMSO controls

Looking at above plots this seems to be true for rowwise and also some columnwise effects. Hence, for spacial bias we apply a loess-normalization to the data.

## split data by plate and apply loess normalization
ctg_loess <- ctg_data %>% select(row, col, pcount, screen) %>% 
    # remove first plate of screen which had pipetting errors
drop_na() %>% split(.$screen) %>% lapply(function(s) {
    ## loess fit. family is 'symmetric' to be robust to outliers
    fit <- loess(pcount ~ row + col, data = s, family = "symmetric")
    ## apply normalization
    tibble(norm_fac = fit$fitted) %>% cbind(s, .) %>% mutate(pcount_norm = pcount - 
        (norm_fac - median(norm_fac)))
}) %>% bind_rows() %>% full_join(ctg_data)
## plot row vs photon count of normalized values
ctg_loess %>% ggplot(aes(row, pcount_norm)) + geom_point() + 
    geom_smooth(method.args = list(family = "symmetric")) + theme_classic() + 
    facet_wrap(~screen)

Next we can investigate plate specific biases and if correcting for these would actually change anything compared to just normalizing to the DMSO controls on every plate individually. We compare the effects by looking at the drug Bortezomib which we expect to be toxic regardless of the organoid line.

Of note, the 32 DMSO controls yield a reference median which is, across all plates, more robust and greater in relative size than a plate-wise median. Hence the DMSO control dependent median should be used for calibration of plates.

## no cross-plate normalization
ctg_loess %>% left_join(ctg_loess %>% filter(drug == "DMSO") %>% 
    group_by(screen) %>% summarise(ctrl = median(pcount_norm, 
    na.rm = T)) %>% ungroup()) %>% group_by(screen) %>% mutate(rv = pcount_norm/ctrl) %>% 
    ungroup() %>% filter(drug %in% c("DMSO", "Bortezomib")) %>% 
    ggplot(aes(screen, rv, colour = drug)) + geom_point() + facet_wrap(~rack.concentration) + 
    theme_classic() + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), 
    axis.ticks.x = element_blank())

Without cross-plate normalization we can see that the drug is toxic, especially at higher concentrations, but we can definitely see some variation. I’m not sure at this point whether this could be a real effect or whether there is variation for different reasons (such as varying growth rate) that is better normalized away. Let’s see if a cross-plate normalization changes anything.

## median photon count across all dmso controls of all plates
all_dmso <- ctg_loess %>% filter(drug == "DMSO") %>% .$pcount_norm %>% 
    median(na.rm = T)
## cross-plate normalization
ctg_pn <- ctg_loess %>% left_join(ctg_loess %>% filter(drug == 
    "DMSO") %>% group_by(screen) %>% summarise(ctrl = median(pcount_norm, 
    na.rm = T)) %>% ungroup()) %>% group_by(screen) %>% mutate(pcn = pcount_norm * 
    (all_dmso/ctrl)) %>% ungroup()
## plot bortezomib
ctg_pn %>% dplyr::select(-ctrl) %>% left_join(ctg_pn %>% filter(drug == 
    "DMSO") %>% group_by(screen) %>% summarise(ctrl = median(pcn, 
    na.rm = T)) %>% ungroup()) %>% group_by(screen) %>% mutate(rv = pcn/ctrl) %>% 
    ungroup() %>% filter(drug %in% c("Bortezomib", "DMSO")) %>% 
    ggplot(aes(screen, rv, colour = drug)) + geom_point() + facet_wrap(~rack.concentration) + 
    theme_classic() + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), 
    axis.ticks.x = element_blank())

So it seems that a cross plate normalization doesn’t actually make a difference, indicating that the growth rates might be affecting the variation in phenotype mostly. Let’s see if this could be adjusted using a quantile normalization.

## generate a matrix with a column per sample and a row per
## well
ctg_mat <- ctg_loess %>% left_join(ctg_loess %>% filter(drug == 
    "DMSO") %>% group_by(screen) %>% summarise(ctrl = median(pcount_norm, 
    na.rm = T)) %>% ungroup()) %>% group_by(screen) %>% mutate(pcount = pcount/ctrl) %>% 
    ungroup() %>% dplyr::select(screen, well, pcount) %>% spread(screen, 
    pcount) %>% data.frame() %>% `rownames<-`(.$well) %>% dplyr::select(-well)
## boxplot
boxplot(ctg_mat)

## remember rownames and colnames
rn <- rownames(ctg_mat)
cn <- colnames(ctg_mat)
## quantile normalize
qnorm_mat <- ctg_mat %>% as.matrix() %>% normalize.quantiles()
rownames(qnorm_mat) <- rn
colnames(qnorm_mat) <- cn
## boxplot of normalized matrix
boxplot(qnorm_mat)

## bring back together with the meta data
ctg_qnorm <- ctg_loess %>% dplyr::select(-c(pcount, pcount_norm, 
    norm_fac)) %>% inner_join(qnorm_mat %>% data.frame() %>% 
    mutate(well = rownames(.)) %>% tbl_df %>% gather(screen, 
    pc_norm, -well) %>% mutate(screen = gsub("^X", "", screen)))
## look at bortezomib
ctg_qnorm %>% filter(drug %in% c("Bortezomib", "DMSO", "Gefitinib")) %>% 
    ggplot(aes(screen, pc_norm, colour = drug)) + geom_point() + 
    facet_wrap(~rack.concentration) + theme_classic() + theme(axis.title.x = element_blank(), 
    axis.text.x = element_blank(), axis.ticks.x = element_blank())

This looks reasonable to me. As a final check we can look at the 5-10 overall most toxic drugs and checkif there are differences between the lines (expecting that there are none).

## select 5 most toxic drugs
dtox <- ctg_qnorm %>% group_by(drug) %>% summarise(medpc = median(pc_norm, 
    na.rm = T)) %>% ungroup() %>% arrange(medpc) %>% .$drug %>% 
    .[1:5]
## boxplot of these drugs across lines and concentrations
ctg_qnorm %>% filter(drug %in% dtox) %>% ggplot(aes(screen, pc_norm)) + 
    geom_boxplot() + facet_wrap(~rack.concentration) + theme_classic() + 
    theme(axis.title.x = element_blank(), axis.text.x = element_blank(), 
        axis.ticks.x = element_blank())

I do not see anything that seems to really be strikingly off so I think it’s ok to precede and calculate IC50/AUC values. Viability values of > 100 % do not really make sense as we wouldn’t expect drug treatment to result in a gain of viability phenotype. Hence, we clip the values at 1 to achieve better AUC/IC50 models.

ctg_qnorm <- ctg_qnorm %>% mutate(pc_norm = ifelse(pc_norm > 
    1, 1, pc_norm))
## histogram of normalized viability
hist(ctg_qnorm %>% .$pc_norm)

2.2 Illustration of dose response curves

I first want to gain an overview of treatment response profiles with the quantile normalized dataset. Therefore I plot some example compounds I am interested in for our case study patient. Using the PharamcoGx package these nice dose-response curves can be fitted.

ctg_qnorm_tidy <- ctg_qnorm %>% ## separate annotation of screen name
separate(screen, c("date", "operator", "mithras", "full_barcode"), 
    remove = FALSE) %>% separate(full_barcode, c("line", "plate", 
    "library"), remove = FALSE, sep = c(7, 11)) %>% group_by(line, 
    drug) %>% arrange(rack.concentration)
save(ctg_qnorm_tidy, file = "ctg_qnorm_tidy.Rdata")
coi <- c("Gefitinib")
loi <- c("D027T01")
list <- ctg_qnorm_tidy %>% full_join(., ras) %>% filter(drug %in% 
    coi & line %in% loi) %>% mutate(concentration = rack.concentration %>% 
    as.numeric()) %>% dplyr::select(line, concentration, pc_norm, 
    drug, ras) %>% ungroup() %>% split(., .$line)
## Joining, by = "line"
c.list <- lapply(list, function(df) {
    df$concentration
})
v.list <- lapply(list, function(df) {
    df$pc_norm
})
PharmacoGx::drugDoseResponseCurve(concentrations = c.list, viabilities = v.list, 
    plot.type = "Fitted", viability_as_pct = FALSE)

Based on question asked on StackOverflow, I build a function to plot dose-response curves using the tidyverse and drc for curve fitting. Link: https://stackoverflow.com/questions/36780357/plotting-dose-response-curves-with-ggplot2-and-drc

tidy_drc_fit <- function(df) {
    # predictions and confidence intervals.
    org.fits <- expand.grid(conc = exp(seq(log(1), log(0.0016), 
        length = 1000)))
    fits <- drm(data = df, pc_norm ~ conc, fct = LL.4(), na.action = na.omit) %>% 
        predict(., newdata = org.fits, interval = "confidence")
    org.fits %>% mutate(p = fits[, 1], pmin = fits[, 2], pmax = fits[, 
        3], line = df$line %>% unique(), drug = df$drug %>% unique()) %>% 
        full_join(df) %>% return(.)
}
coi <- c("Gefitinib")
loi <- c("D027T01")
tmp <- ctg_qnorm_tidy %>% mutate(conc = rack.concentration %>% 
    as.numeric()) %>% dplyr::select(line, drug, pc_norm, conc) %>% 
    filter(line != "D030T01", drug %in% coi) %>% group_by(drug, 
    line) %>% do(., tidy_drc_fit(.))
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
## Joining, by = c("conc", "line", "drug")
tmp %>% full_join(mutations_contig) %>% ggplot(., aes(color = ras, 
    group = line)) + geom_point(aes(x = conc, y = pc_norm), size = 1, 
    shape = 1) + # geom_ribbon(data=tmp, aes(x=conc, y=p, ymin=pmin,
# ymax=pmax), alpha=0.2) +
geom_line(aes(x = conc, y = p)) + coord_trans(x = "log") + scale_y_continuous(limits = c(0, 
    1)) + scale_x_continuous(breaks = c(ctg_qnorm_tidy$rack.concentration %>% 
    unique() %>% as.numeric())) + theme_classic() + ylab("Photon count normalized to control") + 
    xlab("Concentration factor") + theme_classic()
## Joining, by = "line"

3 Quantification

3.1 Quantification of drug response

Different ways of quantifying drug response might yield different response. We first use the PharmacoGx package to calculate AUCs based on both the actual data and the fitted curve (I would assume the first to be more reasonable).

## takes a while
auc_pgx <- ctg_qnorm %>% ## separate annotation of screen name
separate(screen, c("date", "operator", "mithras", "full_barcode"), 
    remove = FALSE) %>% separate(full_barcode, c("line", "plate", 
    "library"), remove = FALSE, sep = c(7, 11)) %>% group_by(screen, 
    drug) %>% arrange(rack.concentration) %>% ## calculate auc both for fitted curve and actual values
mutate(s_AUC_fit = computeAUC(rack.concentration, pc_norm, verbose = F, 
    viability_as_pct = F, area.type = "Fitted"), s_AUC_actual = computeAUC(rack.concentration, 
    pc_norm, verbose = F, viability_as_pct = F, area.type = "Actual")) %>% 
    group_by(line, drug) %>% arrange(rack.concentration) %>% 
    ## calculate auc both for fitted curve and actual values
mutate(l_AUC_fit = computeAUC(rack.concentration, pc_norm, verbose = F, 
    viability_as_pct = F, area.type = "Fitted"), l_AUC_actual = computeAUC(rack.concentration, 
    pc_norm, verbose = F, viability_as_pct = F, area.type = "Actual")) %>% 
    ungroup()
save(auc_pgx, file = "auc_pgx_2.Rdata")

We plot a scatter plot to see if both ways of calculating the AUC agree.

load("~/promise/scripts/auc_pgx_2.Rdata")
# load('~/promise/auc_pgx.Rdata')
auc_pgx %>% ggplot(aes(s_AUC_fit, s_AUC_actual)) + geom_point() + 
    theme_classic()

auc_pgx %>% ggplot(aes(l_AUC_fit, l_AUC_actual)) + geom_point() + 
    theme_classic()

It seems that they agree very well, with the exception of some outliers generated by cases with NAs. Looking through the results manually it seems that the values estimated by the fitted curve are more reasonable. It would be interesting to compare these results with the results of a different AUC quantification, for example using a multilevel model.

3.2 Rank based testing of gene~drug interactions

coi <- c("Gefitinib", "Erlotinib")
loi <- c("D007T01")
auc_pgx %>% full_join(., mutations_contig) %>% filter(drug %in% 
    coi) %>% group_by(screen, drug, line, KRAS) %>% summarise(auc = mean(s_AUC_actual)) %>% 
    ggplot(aes(x = KRAS, y = auc)) + geom_boxplot() + geom_jitter(aes(color = line), 
    width = 0.1) + facet_wrap(~drug) + geom_signif(comparisons = list(c("wildtype", 
    "mutated")), test = "wilcox.test") + theme_classic()
## Joining, by = "line"

Even with dynamic adjustment of AF cutoffs, no clear KRAS - EGFRi interaction can be measured.

coi <- c("Gefitinib", "Erlotinib")
auc_pgx %>% full_join(., mutations_freq_contig) %>% filter(drug %in% 
    coi) %>% group_by(screen, drug, line, KRAS) %>% summarise(auc = mean(s_AUC_actual)) %>% 
    mutate(goi = if_else(KRAS < 0.3, "wildtype", "mutated")) %>% 
    ggplot(aes(x = goi, y = auc)) + geom_boxplot() + geom_jitter(aes(color = line), 
    width = 0.1) + facet_wrap(~drug) + geom_signif(comparisons = list(c("wildtype", 
    "mutated")), test = "wilcox.test") + theme_classic()
## Joining, by = "line"

For the Tumor supressor p53 a AF cutoff of 70% is defined.

coi <- c("Nutlin3a")
auc_pgx %>% full_join(., mutations_freq_contig) %>% filter(drug %in% 
    coi) %>% group_by(screen, drug, line, TP53) %>% summarise(auc = mean(s_AUC_actual)) %>% 
    mutate(goi = if_else(TP53 < 0.7, "wildtype", "mutated")) %>% 
    ggplot(aes(x = goi, y = auc)) + geom_boxplot() + geom_jitter(aes(color = line), 
    width = 0.1) + facet_wrap(~drug) + geom_signif(comparisons = list(c("wildtype", 
    "mutated")), test = "wilcox.test") + theme_classic()  #+ 
## Joining, by = "line"

# RColorBrewer::brewer.pal(12, 'Set3')

For the oncogene PIK3CA a AF cutoff of 30% is defined.

coi <- c("Everolimus")
auc_pgx %>% full_join(., mutations_freq_contig) %>% filter(drug %in% 
    coi) %>% group_by(screen, drug, line, PIK3CA) %>% summarise(auc = mean(s_AUC_actual)) %>% 
    mutate(goi = if_else(PIK3CA < 0.3, "wildtype", "mutated")) %>% 
    ggplot(aes(x = goi, y = auc)) + geom_boxplot() + geom_jitter(aes(color = line), 
    width = 0.1) + facet_wrap(~drug) + geom_signif(comparisons = list(c("wildtype", 
    "mutated")), test = "wilcox.test") + theme_classic()  #+ 
## Joining, by = "line"

# RColorBrewer::brewer.pal(12, 'Set3')

3.2.1 Evaluate growth-rate adjusted compound activity

Next to quantile normalization, two recent papers suggest growth rate adjusted auc calculation to account for differences in doubling times of cell lines. First I load growth rate data from separate experiments and perform first annotation of the dataset. I filter the proliferation dataset further by removing empty wells.

# define datasets to load ctg data into R
filelist <- list.files("~/promise/rsync_data/ctg_data/proliferation_true", 
    pattern = "_Prolif_", recursive = TRUE, full.names = TRUE)
# define a function to load proliferation ctg files into R
# once they match the given definitions
load_delim <- function(full.name) {
    read_delim(full.name, "\t", escape_double = FALSE, col_names = FALSE, 
        trim_ws = TRUE) %>% mutate(plate_name = full.name %>% 
        str_split(., "/") %>% .[[1]] %>% .[length(.)] %>% str_sub(., 
        1, -5))
}
# load ctg data
ctg_prolif_raw <- lapply(filelist, load_delim) %>% bind_rows() %>% 
    `colnames<-`(c("original_name", "well_id_384", "photons", 
        "plate_barcode")) %>% # ugly processing -it works
separate(plate_barcode, c("tmp.plate_barcode", "tmp.mu"), sep = -7, 
    remove = FALSE) %>% mutate(tmp.mu = substr(tmp.mu, 2, 6)) %>% 
    separate(tmp.mu, c("mithras", "user"), sep = "_") %>% separate(tmp.plate_barcode, 
    c("date", "experiment", "timepoint", "lines"), sep = "_", 
    extra = "merge") %>% mutate(no_lines = str_count(lines, "D0")) %>% 
    separate(well_id_384, c("letter", "number"), sep = 1, remove = FALSE) %>% 
    # data from seeding timepoints was generated in 96 well
# plates. For now it will be removed from the dataset to
# enable standardized analysis
filter(!timepoint %in% c("seeding")) %>% # on three dates (170613, 170724 and 171010) not the complete
# plate was filled with organoids. Here the unseeded wells
# are removed from the dataset.
filter(!(date %in% c("170613") & number %in% 13:24)) %>% filter(!(date %in% 
    c("170724") & number %in% 21:24)) %>% filter(!(date %in% 
    c("171010") & number %in% 19:24)) %>% # one single well in the dataset has a photon count of
# exactly 0 (second lowest value being 658). It is set to NA
mutate(photons = replace(photons, which(photons == 0), NA))
# the data has the following distribution ctg_prolif_raw %>%
# ggplot(aes(photons)) + geom_density(adjust = 0.3) +
# scale_x_continuous(limits = c(0,50000)) + theme_minimal() +
# ggtitle('Distribution of Photon counts for all
# Proliferation Experiments')

I complete the annotation of the dataset by linking organoid lines to photon counts and time.

#get an overview of the lines and timepoints available. On each seeding-date there was only one set of lines processed and the plate-layout did not change between measurments.
#define a function to annotate a dataset for a defined seeding-date
annotate_prolif <- function(df){
  tmp.lines <- df %>% 
    dplyr::select(lines) %>%
    .$lines %>% #turn output into a string
    unique() %>%
    str_split(., "_") %>%
    unlist()
  
  tmp.anno <- tibble(number = df %>% .$number %>%  unique(),
                     ratio = length(number)/length(tmp.lines),
                     anno_line = rep(tmp.lines, each = ratio),
                     no_lines = df %>% .$no_lines %>% unique(),
                      #for control reasons the product of the columsn per line and the number of lines is calculated
                     anno_number = no_lines*ratio) 
return(tmp.anno)
}

#redefine and annotate the ctg_prolif dataset
ctg_prolif <- ctg_prolif_raw %>%
  #filter(date == "171010") %>%
  group_by(date) %>%
  do(annotate_prolif(.)) %>%
  merge(ctg_prolif_raw, ., by = c("date", "number", "no_lines")) %>%
  filter(!(number %in% c(1,24) | letter %in% c("A", "P"))) %>%
  #wells loacted on the edge of the plates are removed. However, this filtration step has no impact on the median photon count per plate.  
  mutate(timepoint_num = substr(timepoint,2,2) %>% as.numeric(),
         photons_log2 = log2(photons)) %>%
  filter(!(date %in% c("170515", "170606"))) %>%
  filter((date != "170711") | (anno_line != "D015T01")) %>%
  #seeding-dates that contained a protocol-deviation are removed from the dataset
  dplyr::rename(line = anno_line) 

I gain a first overview about the proliferation dataset. Proliferation data for lines D015T01 and D030T01 appear discordant. Proliferation data for D019T01, D021T01 and D022T01 appear to follow a linear growth pattern

ctg_prolif %>% group_by(date, timepoint_num, line) %>% summarise(median = median(photons, 
    na.rm = TRUE), mad = mad(photons, na.rm = TRUE)) %>% ggplot(aes(timepoint_num, 
    median, color = date)) + geom_point() + geom_path(aes(group = date)) + 
    facet_wrap(~line) + theme_minimal() + ggtitle("Proliferation curves for 12 organoid lines") + 
    ylab("Median photon count for each timepoint") + xlab("Time (d)")

I like to estimate the growth rate of each line in the initial phase of the prolifereation experiment. During this phase all lines show a reproducible growth rate after photon counts have been log2 transformed.

ctg_prolif %>% dplyr::select(timepoint_num, date, line, photons, 
    photons_log2) %>% group_by(date, timepoint_num, line) %>% 
    ggplot(aes(timepoint_num, photons_log2, group = date)) + 
    geom_jitter(alpha = 0.2, width = 0.2) + stat_growthcurve(model = "linear") + 
    facet_wrap(~line) + theme_minimal() + scale_x_continuous(limits = c(2, 
    4), breaks = c(2, 4)) + ggtitle("Proliferation curves for 12 organoid lines over 4 days") + 
    ylab("Log2 of median photon count for each timepoint") + 
    xlab("Time (d)")  # + 

# geom_smooth(alpha = 0.5)

Now I used the transformed photon counts to estimate a doubling rate by fitting linear models to each dataset (line and replicate).

#if multiple timepoints are estimated only these cell lines can be used, since they show log2 linear growth over all measured timepoints
picks <- c("D004T01", "D019T01", "D021T01", "D022T01")
#a classic formula for growth rate calulcation
calc <- ctg_prolif %>% 
  dplyr::select(timepoint_num, date, line, photons) %>%
  filter(timepoint_num %in% c(2,4)) %>% #there is no difference when starting the model at t=2/t=0
  group_by(line, date) %>%
  mutate(timepoint_num = timepoint_num - 2) %>%
  group_by(line, date, timepoint_num) %>% 
  summarise(median.count = median(photons)) %>%
  ungroup() %>% 
  spread(timepoint_num, median.count) %>%
  mutate(gr = log(`2`/`0`)/2,
         dt = log(2)/gr)

I use the fitted data to estimate the treatment date (day 3) of the screen. With this data at hand we can calculate growth rate corrected AOC and IC50 values. Moreover, I tested that the estimation of doubling time is not considerably improved if positional effects are respected and every well is handled as a separate experiment.

ctg_gr <- ctg_loess %>% 
  mutate(time = 168) %>%
  dplyr::rename(concentration = rack.concentration,
                photons = pcount) %>%
   ## separate annotation of screen name 
  separate(screen, c("date", "operator", "mithras", "full_barcode"), remove = FALSE) %>%
  separate(full_barcode, c("line", "plate", "library"), remove = FALSE, sep = c(7, 11)) %>%
  dplyr::select(full_barcode, line, drug, time, concentration, photons) %>% 
  full_join(fit %>% dplyr::select(line, time, photons, drug, full_barcode)) %>%
  #only lines which were still in a log phase of growth on the measured timepoints can be used for estimating the needed timepoints t=3 and t=7
  #filter(line %in% picks) %>%
  filter(line != "D015T01") %>%
  dplyr::rename(cell_count = photons,
                cell_line = line,
                treatment = drug) %>%
  as.tibble() %>%
  arrange(time)

# ctg_gr %>%
#   filter(cell_line %in% picks & treatment == "DMSO" & time == 168) %>%
#   ggplot(aes(full_barcode, cell_count)) +
#   geom_boxplot() +
#   facet_grid(~cell_line) + 
#   theme_minimal() + 
#   theme(axis.title.x=element_blank(),
#         axis.text.x=element_blank(),
#         axis.ticks.x=element_blank()) + 
#   ggtitle("Screen replicates and estimated viability at day 7") + 
#   ylab("Photon counts for DMSO treated wells")

drc_output <- ctg_gr %>% 
  dplyr::select(-full_barcode) %>%
  mutate(concentration = if_else(treatment == "DMSO", "0", concentration),
         treatment = if_else(treatment == "DMSO", "-", treatment),
         concentration = as.numeric(concentration)) %>%
  mutate(time = time - 72) %>%
  GRfit(., case = "C", groupingVariables = c('cell_line','treatment'))
GRdrawDRC(drc_output, experiments = c(paste0(ctg_gr$cell_line %>% unique(), " ", c("D010T01 Gefitinib", "D027T01 Gefitinib"))))

GRgetMetrics(drc_output) %>% 
  filter(pval_GR < 1) %>%
  dplyr::select(cell_line, treatment, GR_AOC) %>%
  as.tibble() %>%
  ungroup() %>%
  dplyr::rename(line = cell_line,
         drug = treatment, 
         aoc.mean = GR_AOC) %>%
  filter(!drug %in% c( "DMSO")) %>%
  #filter(!line %in% c("D015T01", "D010T01")) %>%
  #filter(!grepl(.$drug, pattern = 'mit')) %>%
  spread(line, aoc.mean) %>%
  na.omit() %>%
  as.data.frame() %>%
  remove_rownames() %>%
  column_to_rownames('drug') %>%
  pheatmap( 
  #scale = "row",
  cluster_rows = TRUE,
  show_rownames = TRUE,
  show_colnames = TRUE,
  color = colorRampPalette(c('#ca0020','#f4a582','#f7f7f7', '#92c5de','#0571b0'))(5),
  #color = colorRampPalette(c('#f9f902','#000000','#3701f9'))(10),
  #color = c('#ca0020','#f4a582','#f7f7f7','#92c5de','#0571b0'),
  annotation_col = col,
  annotation_colors = ann_colors,
  #annotation_row = row
  #cutree_cols = 2,
  cutree_rows = 4,
  na_col = "grey"
  ) 

Compare AUC by the GRmetric package with the PharmacoGx analysis by correlating responses. First fit AUC data irrespective of timepoints.

compare_methods <- GRgetMetrics(drc_output) %>% as.tibble() %>% 
    ungroup() %>% dplyr::rename(line = cell_line, drug = treatment, 
    aoc.mean = GR_AOC) %>% dplyr::select(line, drug, aoc.mean) %>% 
    mutate(aoc.mean.scale = )
left_join(auc_pgx %>% dplyr::select(line, drug, l_AUC_fit, l_AUC_actual, 
    s_AUC_fit, s_AUC_actual) %>% distinct(line, drug, .keep_all = TRUE))
compare_methods %>% ggplot(aes(l_AUC_fit, l_AUC_actual)) + geom_point() + 
    theme_classic()
compare_methods %>% ggplot(aes(s_AUC_fit, l_AUC_fit)) + geom_point() + 
    theme_classic()
compare_methods %>% ggplot(aes(aoc.mean, l_AUC_fit)) + geom_point() + 
    theme_classic()

3.3 Illustrate differences in growth rates

For future analysis it might be interesting to know the growth rate of each line as an additional feature. I will calculate a growth rate and attach it to the annotation data which we have previously defined for each organoid line.

calc %>%
  group_by(line) %>% 
  summarise(m_mean = mean(dt)) %>%
  arrange((m_mean)) %>%
  ungroup(line) %>%
  mutate(line = factor(line, levels = line %>% unique())) %>%
ggplot() + 
  geom_col(aes(x=line, y=m_mean), fill="black", stat = "identity",  alpha = 0.9) + #"tomato2"
  geom_point(data = calc %>%
  group_by(line) %>% 
  mutate(m_mean = mean(dt)) %>%
  arrange((m_mean)) %>%
  ungroup(line) %>%
  mutate(line = factor(line, levels = line %>% unique())),
  aes(x=line, y=dt), col="grey", size=2, alpha = 1) +   # Draw points
  # geom_point(aes(x=line, y=m_mean, col = date), size=5, alpha = 0.9) + #"tomato2"
  # geom_point(aes(x=line, y=m, col = date), size=2, alpha = 0.9) +   # Draw points
  # geom_segment(aes(x=line, 
  #                  xend=line, 
  #                  y=min(m), 
  #                  yend=max(m)), 
  #              linetype="dashed", 
  #              size=0.1) +   # Draw dashed lines
  labs(title="Organoid Doubling Time", 
       subtitle="Cell doubling time was estimated inbetween 48 and 96 hours after seeding") +  #caption
  coord_flip() + 
  theme_classic() + 
  xlab("Organoid line") + 
  ylab("Doubling times [days]") + 
  scale_y_log10(breaks = c(1:10))

compound_of_interest <- c("Trifluoridin/Tipiracil")
dodge <- position_dodge(width = 0.15)
ctg_qnorm %>% separate(screen, c("date", "operator", "mithras", 
    "full_barcode"), remove = FALSE) %>% separate(full_barcode, 
    c("line", "plate", "library"), remove = FALSE, sep = c(7, 
        11)) %>% filter(drug %in% compound_of_interest) %>% mutate(line = as.factor(line), 
    drug = factor(drug, levels = compound_of_interest), concentration_factor = as.numeric(rack.concentration)) %>% 
    group_by(line, drug, concentration_factor) %>% dplyr::summarise(mean_photons = mean(pc_norm, 
    na.rm = TRUE), sd_photons = sd(pc_norm, na.rm = TRUE), range_low_photons = range(pc_norm)[1], 
    range_high_photons = range(pc_norm)[2]) %>% ggplot(aes(y = mean_photons, 
    x = concentration_factor)) + geom_point(position = dodge, 
    stat = "identity", aes(colour = line), size = 2) + geom_path(aes(colour = line), 
    alpha = 0.6, position = dodge, size = 2) + # geom_errorbar(aes(ymin=mean_photons-sd_photons,
# ymax=mean_photons+sd_photons, group = line), width=0,
# position = dodge, size = 1, alpha = 0.5) + geom_ribbon(
# aes(x=concentration_factor, y=mean_photons,
# ymin=mean_photons-sd_photons,
# ymax=mean_photons+sd_photons), alpha=0.2) +
scale_x_log10(breaks = c(7, 14)) + ylab("Photon count normalized to control") + 
    facet_wrap(~drug) + xlab("Concentration factor 5-fold, mean of n=2") + 
    scale_colour_brewer(palette = "Set3") + theme_minimal()

4 Session info

sessionInfo()
## R version 3.4.0 (2017-04-21)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.13.1
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] drc_3.0-1                  MASS_7.3-47               
##  [3] ggsignif_0.4.0             bindrcpp_0.2              
##  [5] readxl_1.0.0               GRmetrics_1.2.0           
##  [7] SummarizedExperiment_1.6.5 DelayedArray_0.2.7        
##  [9] matrixStats_0.52.2         Biobase_2.36.2            
## [11] GenomicRanges_1.28.6       GenomeInfoDb_1.12.3       
## [13] IRanges_2.10.5             S4Vectors_0.14.7          
## [15] BiocGenerics_0.22.1        stringr_1.2.0             
## [17] preprocessCore_1.38.1      growthcurve_1.0.0.9000    
## [19] PharmacoGx_1.6.1           pheatmap_1.0.8            
## [21] dplyr_0.7.4                purrr_0.2.4               
## [23] readr_1.1.1                tidyr_0.7.2               
## [25] tibble_1.3.4               ggplot2_2.2.1.9000        
## [27] tidyverse_1.1.1            knitr_1.17                
## [29] BiocStyle_2.4.1           
## 
## loaded via a namespace (and not attached):
##   [1] TH.data_1.0-8           fgsea_1.2.1             minqa_1.2.4            
##   [4] colorspace_1.3-2        rprojroot_1.2           lsa_0.73.1             
##   [7] XVector_0.16.0          MatrixModels_0.4-1      SnowballC_0.5.1        
##  [10] mvtnorm_1.0-6           lubridate_1.7.1         xml2_1.1.1             
##  [13] codetools_0.2-15        splines_3.4.0           mnormt_1.5-5           
##  [16] jsonlite_1.5            nloptr_1.0.4            magicaxis_2.0.3        
##  [19] pbkrtest_0.4-7          broom_0.4.2             cluster_2.0.6          
##  [22] mapproj_1.2-5           compiler_3.4.0          httr_1.3.1             
##  [25] backports_1.1.1         assertthat_0.2.0        Matrix_1.2-11          
##  [28] lazyeval_0.2.1          limma_3.32.10           formatR_1.5            
##  [31] quantreg_5.34           htmltools_0.3.6         tools_3.4.0            
##  [34] igraph_1.1.2            gtable_0.2.0            glue_1.2.0             
##  [37] GenomeInfoDbData_0.99.0 RANN_2.5.1              reshape2_1.4.2         
##  [40] maps_3.2.0              fastmatch_1.1-0         Rcpp_0.12.13           
##  [43] slam_0.1-40             cellranger_1.1.0        gdata_2.18.0           
##  [46] nlme_3.1-131            psych_1.7.8             lme4_1.1-14            
##  [49] rvest_0.3.2             gtools_3.5.0            zoo_1.8-0              
##  [52] zlibbioc_1.22.0         scales_0.5.0.9000       hms_0.3                
##  [55] relations_0.6-7         sandwich_2.4-0          SparseM_1.77           
##  [58] RColorBrewer_1.1-2      sets_1.0-17             yaml_2.1.14            
##  [61] gridExtra_2.3           downloader_0.4          stringi_1.1.5          
##  [64] NISTunits_1.0.1         plotrix_3.6-6           caTools_1.17.1         
##  [67] BiocParallel_1.10.1     rlang_0.1.4             pkgconfig_2.0.1        
##  [70] bitops_1.0-6            pracma_2.0.7            evaluate_0.10.1        
##  [73] lattice_0.20-35         bindr_0.1               labeling_0.3           
##  [76] htmlwidgets_0.9         tidyselect_0.2.3        plyr_1.8.4             
##  [79] magrittr_1.5            bookdown_0.5            R6_2.2.2               
##  [82] gplots_3.0.1            multcomp_1.4-7          sm_2.2-5.4             
##  [85] withr_2.1.0             mgcv_1.8-22             haven_1.1.0            
##  [88] foreign_0.8-69          survival_2.41-3         RCurl_1.95-4.8         
##  [91] nnet_7.3-12             modelr_0.1.1            car_2.1-5              
##  [94] KernSmooth_2.23-15      plotly_4.7.1            rmarkdown_1.6          
##  [97] grid_3.4.0              data.table_1.10.4-3     marray_1.54.0          
## [100] forcats_0.2.0           piano_1.16.4            digest_0.6.12          
## [103] munsell_0.4.3           celestial_1.4.1         viridisLite_0.2.0      
## [106] quadprog_1.5-5